The Original Shareware 1.1


Text File | 1990-02-21 | 17KB | 270 lines

LEASTQ is a general purpose curve fitting program designed to find the coefficients a[i] of a function, y(x), of the form:

   (1)  y(x) = a[0] + a[1]*f1(x) + a[2]*f2(x) + . . . .

where fi(x) are some functions of x. Data for this program consists of a set of points, each one described by three numbers, x[k], y[k] and err[k], where y[k] is an experimental value measured at x[k] and err[k] is the uncertainty in this measurement. The program finds the coefficients a[i] that minimize Chisq, the sum of the squares of the deviations of the calculated y from the measured y, each deviation divided by the error of its point:

   (2)  Chisq = Sum(k = 1;Npts) of (y[k] - ycalc(x[k]))^2/err[k]^2

where ycalc(x[k]) is y calculated from x[k] with formula (1) above. The procedure of minimizing Chisq is called least squares fitting.

To call this program, type

   LEASTQ [MYFILE.DAT] [/E] <CR>

from the command line. The parameters in square brackets are optional. MYFILE.DAT stands for the name of the file containing the data you want to fit; if this parameter is omitted, the opening screen of the program will show you a list of all the files in your current directory with the extension DAT and you can select one of these files by choosing its number.

The '/E' parameter forces the program to use EGA rather than VGA graphics. Without it, the program will use VGA graphics if it detects that your machine has VGA available. The reason for this option is that you may want to run LEASTQ with a graphics screen dump TSR loaded. Some of these, such as SCRNDU, do not handle VGA graphics properly.

LEASTQ assumes that any file with extension 'DAT' is a data file for this program. If you have other files with this extension on the disk where you keep LEASTQ, you may want to consider moving them to another disk or directory so that you don't call them by mistake.

A data file for this program consists of three columns of numbers.
The first column contains the values of x, the second column the corresponding values of y, and the third column the error in y. Each line in the data file, therefore, corresponds to a data point. The proper format can be ascertained from the sample data files that have been provided. Spacing of the columns is not critical and comments placed after the third column are ignored by the program. You can generate a data file with an editor or word processor, or a data file could be generated by an application program.

You can also generate a data file from LEASTQ. Start the program without specifying a data file in the command line and choose 'none of the above' from the menu of .DAT files in your directory. You will be put into an editor that will allow you to enter data from the keyboard which you can subsequently save to the disk if you wish.

When you enter the keyboard routine, you will be asked if your data has evenly spaced values of x; if so, enter the first x and the step between the x's. The program will then fill in the values of x for you. This can save a lot of time. If you enter the values of x yourself, you can put your data in any order; LEASTQ will sort your data in order of ascending x when you are finished. It is perfectly acceptable for two points to have the same x coordinate; this sometimes happens with real data when you go back and repeat a measurement.

You will then be asked if the dependent variable, y, is a count of something: the number of beans in a jar, for instance. In that case, the program will calculate the appropriate value of the error, which is the square root of y. If you define the data as a count, you will not be allowed to enter negative values of y.

If you want all data points to have the same error, enter the first error and then just <cr> for subsequent entries; the error for the current point will be taken from the previous point.
All data points MUST have a nonzero error; any data point without an error will be ignored by the program. This is true, also, of data read from a file; points with zero error will be ignored. You can use the cursor keys to go back and correct data that was entered incorrectly. Hit <esc> to exit the data entry routine and return to the main program.

You must have at least 3 valid data points to run this program and you are allowed a maximum of 1024 data points. You will exit the data entry routine automatically if you exceed this number. The limit applies also to data read from a file: if the program is reading data from a file with more than 1024 entries, it will stop reading after the 1024th data point.

It is difficult to enter a large number of data points without making a few mistakes, forgetting a decimal point, for instance. When you have finished entering data from the keyboard, you will be shown a plot of your data. If you see some obvious mistakes, press <N> at the question 'Data OK?'. Count the points from the left to find the point you want to change; the first point on the left is #1. Enter the number of the point; it will be highlighted in blinking white and you will be asked to confirm that it is the point you want to change. You will be asked to replace x, y and err for the point you have chosen. Entering <CR> leaves the quantity unchanged. If there are additional points you would like to correct, press <Y> at the question 'more'; press <N> when you are finished correcting your data.

Then you will be given the option of storing your data as a file for later use. Note that you do not give your file name the extension 'DAT'; the program will do that for you. A data file will be created in your current directory with the name you have given.

After you have loaded your data from a file or from the keyboard, you are shown a menu of functions that can be used to fit the data. A library of commonly used functions has been provided.
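
The rules for data files described above (three columns, trailing comments ignored, points with zero error dropped, at most 1024 points, at least 3 valid points) can be sketched in Python. This is only an illustration, not LEASTQ's own code: the names parse_data, MAX_POINTS and MIN_POINTS are made up here, and skipping unreadable lines, rejecting negative errors, and sorting file data by x are assumptions (the text only says that keyboard data is sorted).

```python
MAX_POINTS = 1024  # limit stated in the text
MIN_POINTS = 3     # minimum number of valid points stated in the text


def parse_data(text):
    """Return a list of (x, y, err) tuples read from three-column text."""
    points = []
    for line in text.splitlines():
        fields = line.split()          # column spacing is not critical
        if len(fields) < 3:
            continue                   # assumption: skip blank or short lines
        try:
            x, y, err = (float(f) for f in fields[:3])
        except ValueError:
            continue                   # assumption: skip lines that are not numeric
        if err <= 0.0:
            continue                   # zero error: ignored (negative: assumption)
        points.append((x, y, err))     # anything after column 3 is a comment
        if len(points) == MAX_POINTS:
            break                      # stop after the 1024th point
    if len(points) < MIN_POINTS:
        raise ValueError("need at least 3 valid data points")
    points.sort(key=lambda p: p[0])    # assumption: ascending x, as for keyboard data
    return points


sample = """\
3.0  9.1  0.5   last point, entered first
1.0  1.2  0.5
2.0  4.1  0.0   zero error, will be dropped
2.0  3.9  0.5
"""
points = parse_data(sample)
```
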
Select the function you wish to use by typing the highlighted letter. The appropriate function is often determined by the differential equation you want to solve and the boundary conditions. In cases where the choice of function is open, it is sometimes useful to look at a plot of the data before selecting the species of function to use. Selecting 'V' (for 'View') from the menu will display a plot of the data together with the list of functions, and your choice can be made while viewing this plot; otherwise, hitting any key will return to the select functions menu.

After you have selected your function, you will be asked to supply a range over which x is to vary. For example, if your data consists of the Dow Jones Industrial Average, y, versus the year, x, over the range 1948 to 1988, you would probably want x to run over a range from 0 to 40 or perhaps 0 to 1, rather than 1948 to 1988, so that the coefficients a[i] will have a reasonable size. Note that if you choose Powers and a range of 0 to 1.0, all coefficients will have the same weight in the calculated value of y at the maximum x; you can see at a glance if the a[i] are converging to 0 as i increases.

For trigonometric functions, you will be asked the number of cycles that the range of x is to represent. Elliott wave theorists can play around with those Dow Jones averages. For Chebychev and Legendre polynomials the range of x should be -1.0 to +1.0 or smaller. For Bessel functions the lower limit must be greater than or equal to 0, and the upper limit should be no greater than 25 because of the limitations of the algorithm used in this program to calculate Bessel functions.

Next you will be asked the number of coefficients in the fit. The minimum allowable number is 2 and the maximum is 12, with the restriction that the number of coefficients must be less than the number of data points.
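
Equations (1) and (2) and the range mapping just described can be put together in a short Python sketch. It is not the algorithm LEASTQ uses internally: it assumes that the Powers species means fi(x) = x^i, it solves the weighted normal equations by Gaussian elimination rather than by the matrix inversion the program performs, and all function names are hypothetical.

```python
def rescale(x, x_lo, x_hi, new_lo=0.0, new_hi=1.0):
    """Map x from the data range [x_lo, x_hi] onto the chosen fit range."""
    return new_lo + (x - x_lo) * (new_hi - new_lo) / (x_hi - x_lo)


def solve(matrix, rhs):
    """Solve matrix * a = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ZeroDivisionError("singular matrix: try fewer coefficients")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    a = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * a[j] for j in range(i + 1, n))
        a[i] = s / m[i][i]
    return a


def fit_powers(xs, ys, errs, ncoef):
    """Coefficients a[i] minimizing Chisq of equation (2), with fi(x) = x**i."""
    matrix = [[0.0] * ncoef for _ in range(ncoef)]
    rhs = [0.0] * ncoef
    for x, y, err in zip(xs, ys, errs):
        w = 1.0 / err ** 2                      # weight of this point
        f = [x ** i for i in range(ncoef)]      # basis functions at x
        for i in range(ncoef):
            rhs[i] += w * y * f[i]
            for j in range(ncoef):
                matrix[i][j] += w * f[i] * f[j]
    return solve(matrix, rhs)


def chisq(xs, ys, errs, coeffs):
    """Equation (2): weighted sum of squared deviations."""
    total = 0.0
    for x, y, err in zip(xs, ys, errs):
        ycalc = sum(a * x ** i for i, a in enumerate(coeffs))
        total += ((y - ycalc) / err) ** 2
    return total


# Made-up example: years 1948..1988 mapped onto 0..1, y exactly quadratic.
years = [1948, 1958, 1968, 1978, 1988]
xs = [rescale(t, 1948, 1988) for t in years]
ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]
errs = [0.1] * len(xs)
coeffs = fit_powers(xs, ys, errs, 3)
```

With the range mapped to 0 to 1 the matrix entries stay of order 1; using the raw years 1948 to 1988 would put powers of roughly 2000 into the same matrix, which illustrates why a small range keeps the coefficients at a reasonable size.
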
After each calculation the program will display a plot of the data together with the curve calculated by equation (1) and the Chisq of the fit as calculated from equation (2) (the lower the Chisq, the better the fit), and ask you 'accept fit?'. If you think more coefficients are needed, answer 'N'. It is good practice to start with a small number of coefficients and increase this number until adding another coefficient produces only a small decrease in Chisq.

The algorithm for calculating the a[i] involves a matrix inversion. With a large number of coefficients this inversion sometimes bombs; the bomb will place you back in the menu at the place where you choose the number of coefficients and you will just have to settle for a smaller number of coefficients. Sometimes increasing the range of x will permit you to use more coefficients.

In two special cases, function species 'Exponen' and 'Gaussian', you are allowed two and three coefficients respectively; the step of choosing the number of coefficients is bypassed.

If you answer 'Y' to accept the fit, the program will list the coefficients a[i], a comparison of experimental y[k] with calculated y's for all data points, the Chisq of your fit, and a quantity called the confidence level, which will be discussed below.

The program will then present you with a menu of options. Choose 'D' to try a new data file. Choose 'R' to try a different range in x. Choose 'S' if you want to try a different species of function. For instance, you may have fit the data to Sines and now notice that the coefficients of sin(2*x), sin(4*x) etc. are very small and have large errors; you might then try fitting with Oddsines. The most general fit to trigonometric functions is called Fourier; you can start with this and later choose the species of trigonometric function that gives the best fit to your data.
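
The advice to add coefficients until Chisq stops dropping appreciably can be written as a small stopping rule. The sketch below is an illustration only: the 5 percent threshold for a "small decrease" is an arbitrary assumption, the function name pick_ncoef is hypothetical, and the Chisq values in the example are invented, not taken from any of the supplied data files.

```python
def pick_ncoef(chisq_by_ncoef, min_fractional_drop=0.05):
    """Return the number of coefficients after which Chisq stops improving.

    chisq_by_ncoef maps a number of coefficients to the Chisq obtained with it.
    The search walks upward and stops as soon as one more coefficient lowers
    Chisq by less than min_fractional_drop of its previous value.
    """
    counts = sorted(chisq_by_ncoef)
    best = counts[0]
    for prev, cur in zip(counts, counts[1:]):
        drop = chisq_by_ncoef[prev] - chisq_by_ncoef[cur]
        if drop < min_fractional_drop * chisq_by_ncoef[prev]:
            break                      # only a small decrease: stop here
        best = cur
    return best


# Invented Chisq values for 2 to 6 coefficients.
trial = {2: 400.0, 3: 120.0, 4: 30.0, 5: 29.0, 6: 28.5}
chosen = pick_ncoef(trial)
```
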
Sometimes it becomes evident that your fit is not working out; the Chisq remains large or the matrix inversion bombs when you try to use an adequate number of coefficients. In this case, it is best to pretend to 'buy' the solution so as to get into a menu where you can pick a different species of function or a different range in x. The characteristics of a good fit are that the coefficients are reasonably small and converging, that is, |a[i+1]| < |a[i]|, and that the errors are small, at least for the first few terms.

Finally, if your inability to get a good fit appears to be the result of just one or two bad points, you can edit the data so as to bring the troublemakers in line. Choose option 'F' for 'Fudge data'. You will get a plot of your data together with your most recent fit, from which you can adjust points with the same procedure used to correct bad points after entering them from the keyboard. Points which have been fudged are henceforth plotted in green, whereas the original unchanged data points are plotted in red. You realize, of course, that using this procedure with actual experimental data is highly unethical, so you will have to answer to your conscience if you use this option. Because LEASTQ does not want to become your accomplice in crime, it will not allow you to save your shamelessly doctored data to disk.

Choose option 'Q' to quit the program. The program will ask you if you want a printed output of your results. Hit 'Y' to get a printed record of your fit together with a comparison of your data with the calculated values.

Some data files have been provided for demonstration purposes:

SCURVE
   Try fitting to Powers. Note that Chisq does not decrease much after 4 coefficients. Also try Bessels with a range of 0 to 1.0.

SQUAREWV
   Try Sines with range = 1.0 cycles. Then, since the even values obviously aren't pulling their weight in the boat, try Oddsines. Note that this data does not represent a true square wave; the sides have a finite slope.
   A good fit to a true square wave requires more terms than this program allows.

SAWTOOTH
   Try Sines with range = 1.0 cycles. Note that with 8 coefficients the stupid computer thinks that it has done a fantastic job of fitting the data. You, the clever human being, have to set it straight. Six coefficients is about the best fit.

TRAPEZD
   Try Powers with a range from 0 to 1. Note that it takes many coefficients to get a good fit and that the coefficients a[i] are very big and have big errors. A power series is not a good choice for extrapolation; this expression will explode outside the fitted range. Then try Oddsines with range = 0.5 cycles.

EXPONEN
   Obviously, the first function to try is Exponen. Set the range from 0 to 60 so that you get the real decay length as coefficient a[1]. You will get a logarithmic plot, which gives you a good view of the points with small y but a poor view of the points with large y. Then try Powers, which will give you an arithmetic plot and a good view of the big points. Powers with 6 coefficients gives a good fit.

GAUSSIAN
   Try Gauss, another special case. The log of the data is fit to a parabola with only three coefficients allowed, and the plot is logarithmic. Then try Oddsines with range = 0.5 cycles to see an arithmetic plot.

THERMOCP
   Data for this were taken from the Handbook of Chemistry and Physics with very small errors assigned. Try Powers with a range of x from 0 to 27.39. (Since you intend to use the calculated coefficients in a programmable pocket calculator, you use the real range of x so you can enter the coefficients directly.) The plot is not much use here; the errors are too small to see. Keep an eye on Chisq and keep adding terms until a decent fit is obtained. Six coefficients are sufficient to give < 0.1 degree accuracy over the range of x. The first coefficient, a[0], is negligible and should be thrown away. Try also Tscheby (Chebychev polynomials) with range 0 to 1.
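
The logarithmic fits used for EXPONEN and GAUSSIAN rest on the fact that the logarithm of each of these functions is a polynomial in x, so it can be fitted with a form like equation (1). A short derivation, assuming the usual parametrizations; the symbols A (amplitude), L (decay length), mu (center) and sigma (width) are chosen here for illustration and need not correspond one-to-one to LEASTQ's a[i]:

```latex
\begin{align*}
y &= A\,e^{-x/L}
  &&\Longrightarrow&
  \ln y &= \ln A \;-\; \frac{1}{L}\,x ,\\[4pt]
y &= A\,e^{-(x-\mu)^2/(2\sigma^2)}
  &&\Longrightarrow&
  \ln y &= \Bigl(\ln A - \frac{\mu^2}{2\sigma^2}\Bigr)
          \;+\; \frac{\mu}{\sigma^2}\,x
          \;-\; \frac{1}{2\sigma^2}\,x^2 .
\end{align*}
```

The first is a straight line with two coefficients and the second a parabola with three, which matches the two and three coefficients allowed for species 'Exponen' and 'Gaussian'.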
For most of these data files the errors were chosen arbitrarily; thus, the absolute value of Chisq has no particular significance and is merely a relative indicator of goodness of fit. For data from actual experimental measurements where a reliable estimate of the uncertainty is available, the value of Chisq is significant. A rough rule of thumb is that Chisq should be somewhat smaller than the number of degrees of freedom (the number of data points minus the number of coefficients).

A more precise evaluation of the significance of the Chisq obtained is the confidence level, which is calculated every time you 'buy' a fit. The confidence level is a number that varies between 0 and 1. A perfect fit corresponds to a confidence level of 1.000. An extremely poor fit is described by a confidence level of 0.000. In practice, any confidence level over 0.8 is a pretty good fit; with a confidence level under 0.5 you had better think about framing a new hypothesis, i.e., choosing a different species of function or range in x.

When choosing between two species of functions, the confidence level rather than the value of Chisq is the best indicator of goodness of fit. For example, consider once again the data file GAUSSIAN, which was created with a random number generator and statistical errors. Using function species Gaussian you will get a Chisq of 23.839 for 31 degrees of freedom, giving a confidence level of 0.8171. Now try species Oddsines with 12 coefficients. You will get a Chisq of 19.31, which looks like a better fit, but with only 22 degrees of freedom your confidence level is 0.6259. What is happening is that with 12 coefficients to play around with, LEASTQ can get cute and adjust to the noise in the data. Oddsines with only 6 coefficients is a better fit, giving a confidence level of 0.7533.

While the higher the confidence level, the better the fit, you should be suspicious of experimental data with too high a confidence level; it may indicate that some unscrupulous person has Fudged the data. Try fudging GAUSSIAN.
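
The text does not give the formula behind the confidence level. A common definition that fits the description (1.000 for a perfect fit, 0.000 for a very poor one) is the probability that a chi-square variable with the given degrees of freedom exceeds the observed Chisq. The Python sketch below assumes that definition; it is not taken from LEASTQ, and the function names are hypothetical. It uses the closed-form sums that exist for integer degrees of freedom, so only the standard library is needed.

```python
import math


def degrees_of_freedom(npts, ncoef):
    """Number of data points minus number of fitted coefficients."""
    return npts - ncoef


def chisq_upper_tail(chisq, dof):
    """Probability that a chi-square variable with dof degrees exceeds chisq."""
    if dof < 1:
        raise ValueError("need at least one degree of freedom")
    if chisq <= 0.0:
        return 1.0
    half = chisq / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-h) * sum_{j=0}^{dof/2 - 1} h^j / j!
        term = 1.0
        total = 1.0
        for j in range(1, dof // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(h)) + exp(-h) * sum_{j=1}^{(dof-1)/2} h^(j-1/2) / Gamma(j+1/2)
    total = 0.0
    term = math.sqrt(half) / math.gamma(1.5)    # the j = 1 term
    for j in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= half / (j + 0.5)                # step from term j to term j+1
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# The two GAUSSIAN fits quoted in the text, under the assumed definition.
cl_gaussian = chisq_upper_tail(23.839, 31)
cl_oddsines = chisq_upper_tail(19.31, 22)
```

If the assumption is right, cl_gaussian and cl_oddsines should come out close to the 0.8171 and 0.6259 quoted above, and the comparison shows directly why a lower Chisq with fewer degrees of freedom can still mean a lower confidence level.
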
With just a few changed points you can get a confidence level of 0.998.

***************************************

This program is free and may be distributed by bulletin boards and archiving services at their usual rates. Report bugs and suggestions for improvement to the author:

   John D. Fox
   309 NW 24th Street
   Gainesville, FL 32607
   CIS: 71270,2304